Why is English so easy to segment?

Authors

  • Abdellah Fourtassi
  • Benjamin Börschinger
  • Mark Johnson
  • Emmanuel Dupoux
Abstract

Cross-linguistic studies on unsupervised word segmentation have consistently shown that English is easier to segment than other languages. In this paper, we propose an explanation of this finding based on the notion of segmentation ambiguity. We show that English has a very low segmentation ambiguity compared to Japanese and that this difference correlates with the segmentation performance in a unigram model. We suggest that segmentation ambiguity is linked to a trade-off between syllable structure complexity and word length distribution.
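To make the intuition behind segmentation ambiguity concrete, the sketch below counts how many ways an unsegmented string can be split into words from a given lexicon. This is only an illustration under stated assumptions: the function name `count_segmentations` and the toy lexicon are hypothetical, and a raw count of parses is not claimed to be the measure the authors use.

```python
def count_segmentations(text, lexicon):
    """Count the ways `text` can be split into a sequence of lexicon words."""
    # ways[i] holds the number of segmentations of the prefix text[:i].
    ways = [0] * (len(text) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(text) + 1):
        for start in range(end):
            # Extend every segmentation of text[:start] with the word text[start:end].
            if ways[start] and text[start:end] in lexicon:
                ways[end] += ways[start]
    return ways[len(text)]


# Hypothetical toy lexicon, chosen only to produce competing parses.
lexicon = {"a", "an", "ice", "nice", "cream", "icecream"}
print(count_segmentations("anicecream", lexicon))  # a|nice|cream, an|ice|cream, an|icecream
```

A string with many competing parses is harder for a learner to segment than one with a single parse, which is the informal sense in which a language with more such strings would be more ambiguous.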

Similar resources

Handicrafts, Encountering Modern Technology

This article aims to emphasize certain points concerning traditional art, and to put forward a question. As the term "traditional art" is rather ambiguous, we first try to clarify it. To do so, we propose an approach somewhat different from the one generally accepted. We then discuss the reasons why it is not so easy to give a definition of traditional art, Islamic art in particular, specially...

A Comparative Study of "From 7 O`clock to 9:30" by Na-albandian and "Erostratus" by Sartre by Applying Bakhtinian Dialogism

The present study compares 'From 7 o`clock to 9:30' by Abbas Na-albandian and 'Erostratus' by Jean-Paul Sartre using Bakhtin's intertextual approach, with an eye on Bakhtin's notion of dialogic imagination. Bakhtin's approach focuses on the text rather than the author, and that is why the approach is in such demand at present. Also his dialogism refers to the fact that...

Why we need to read and understand literature: literariness and Hans Rosling’s Factfulness (2018)

My article addresses the qualities of “good” literature and how an understanding of the nature of literary devices, so-called “literariness”, can enhance the reading experience. Focusing on Hans Rosling’s Factfulness (2018), I discuss some of the most important features of good writing. Six literary devices have been selected for special attention: point of view, tone, amplification, anecdotes,...

The two be's of English

This qualitative study investigates the uses of be in Contemporary English. Based on this study, one easy claim and one more difficult claim are proposed. The easy claim is that the traditional distinction between be as a lexical verb and be as an auxiliary is faulty. In particular, 'copular-be', traditionally considered to be a lexical verb, is in fact a prototypi...

Wolves and Big Yellow Taxis: How Would We Know If the NHS Is at Death’s Door? Comment on “Who killed the English National Health Service?”

Martin Powell suggests that the death of the English National Health Service (NHS) has been announced so many times we are at risk of not noticing should it actually happen. He is right. If we ‘cry wolf’ too many times, we risk losing sight of what is important about the NHS and why.

Publication date: 2013